Statistical Translation Model Based On Source Syntax Structure
نویسندگان
چکیده
Syntax-based statistical translation model is proved to be better than phrasebased model, especially for language pairs with very different syntax structures, such as Chinese and English. In this talk I will introduce a serial of statistical translation models based on source syntax structure. The tree-based model uses the one best syntax tree for translation. The forest-based model uses a compact forest which encodes exponential number of syntax trees in a polynomial spaces and lead to better performance. The joint parsing and translation model produces source parse trees, using the source side of the translation rules instead of separate parsing rules, and generate translations on the target side simultaneously, which outperforms the forest-based model. Some extensions of these models are introduced also.
منابع مشابه
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
متن کاملA Discriminative Syntactic Model for Source Permutation via Tree Transduction
A major challenge in statistical machine translation is mitigating the word order differences between source and target strings. While reordering and lexical translation choices are often conducted in tandem, source string permutation prior to translation is attractive for studying reordering using hierarchical and syntactic structure. This work contributes an approach for learning source strin...
متن کاملA Direct Syntax-Driven Reordering Model for Phrase-Based Machine Translation
This paper presents a direct word reordering model with novel syntax-based features for statistical machine translation. Reordering models address the problem of reordering source language into the word order of the target language. IBM Models 3 through 5 have reordering components that use surface word information but very little context information to determine the traversal order of the sour...
متن کاملA Source Dependency Model for Statistical Machine Translation
In the formally syntax-based MT, a hierarchical tree generated by synchronous CFG rules associates the source sentence with the target sentence. In this paper, we propose a source dependency model to estimate the probability of the hierarchical tree generated in decoding. We develop this source dependency model from word-aligned corpus, without using any linguistically motivated parsing. Our ex...
متن کاملNiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierachical phrase-based model, and various syntaxbased models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers...
متن کامل